Haizhou Li

EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis

Jan 30, 2026

CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation

Dec 25, 2025

ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction

Nov 09, 2025

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Oct 26, 2025

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Sep 11, 2025

NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation

Sep 04, 2025

Interpolating Speaker Identities in Embedding Space for Data Expansion

Aug 26, 2025

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Aug 20, 2025

UniTalker: Conversational Speech-Visual Synthesis

Aug 06, 2025

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

Jul 23, 2025